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PREFACE 


The  David  W.  Taylor  Lectures  were  initiated  as  a living  memorial  to 
our  founder,  in  recognition  of  his  many  contributions  to  the  science  of 
naval  architecture  and  naval  hydromechanics.  His  systematic  investiga- 
tion of  resistance  of  ship  hulls  is  universally  known  and  used,  but  of 
equal  importance  was  his  use  of  hydrodynamic  theory  to  solve  practical 
problems.  Many  of  the  experimental  techniques  which  he  pioneered  are 
still  in  use  today  (for  example,  the  use  of  a spherical  pitot  tube  for 
exploring  the  structure  of  a wake  field).  The  system  of  mathematical 
lines  developed  by  Taylor  was  used  to  develop  many  designs  for  the  Navy 
long  before  the  computer  was  invented.  And  perhaps  most  important  of 
all,  he  established  a tradition  of  applied  scientific  research  at  the 
**Model  Basin”  which  has  been  carefully  nurtured  through  the  decades,  and 
which  we  treasure  and  protect  today. 

These  lectures  were  conceived  to  support  and  strengthen  this 
tradition.  We  will  invite  eminent  scientists  in  fields  closely  related 
to  the  Center’s  work  to  spend  a few  weeks  with  us,  to  consult  with  and 
advise  our  working  staff,  and  to  give  lectures  on  subjects  of  current 
interest. 

It  is  most  fitting  that  Professor  Reinier  Timman,  mathematician  and 
philosopher,  initiate  this  series.  He  has  long  been  a friend  and  on 
several  occasions  has  used  the  Center  for  a retreat,  to  his  benefit  and 
ours.  He  has  inspired  and  advised  our  staff  and  cooperated  in  our  work. 
His  students  at  Delft  have  made  leading  contributions  to  the  development 
of  modern  naval  hydrodynamics.  Professor  Timman ’s  belief  that  mathe- 
matics can  contribute  powerfully  to  our  technology  is  much  in  the  David 
Taylor  tradition.  We  are  honored  that  he  agreed  to  give  the  first  in 
this  David  W.  Taylor  Lecture  Series. 

W.  E.  CUMMINS 
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FOREWORD 


It  is  great  honor  to  me  to  be  invited  to  give  the  first 
in  the  series  of  David  W.  Taylor  Lectures,  My  associations 
with  the  Model  Basin  date  from  a long  time  ago,  and  a visit 
to  the  United  States  is  for  me  not  a real  visit  unless  I 
have  the  opportunity  to  taste  once  more  the  stimulating 
atmosphere  which  not  only  gives  the  Model  Basin  an  out 
standing  place  in  hydrodynamical  research  but  also  acts  as 
a breeding  ground  where  nearly  all  outstanding  people  in  the 
field  passed  an  essential  period  in  their  lives.  So  I am 
extremely  grateful  to  have  been  given  the  opportunity  once 
more  to  spend  some  time  at  this  most  interesting  place  and 
to  participate  in  its  work.  I wish  to  express  my  gratitude 
to  Justin  McCarthy  who  originated  the  idea  of  the  lectures 
and  to  all  other  friends  who  made  this  period  a success. 

In  particular,  I am  pleased  that  Dr.  Langan,  whom  I used  to 
know  as  a promising  undergraduate  student,  did  a fine  job 
in  editing  the  lectures. 


R.  TIMMAN 
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ABSTRACT 

The  lectures  present  an  introduction  to  modern  control 
theory.  Calculus  of  variations  is  used  to  study  the  problem 
of  determining  the  optimal  control  for  a deterministic  sys- 
tem without  constraints  and  for  one  with  constraints.  The 
method  of  dynamic  programming  is  also  used  to  solve  the 
unconstrained  control  problem.  Stochastic  systems  are  intro- 
duced, and  the  Kalman-Bucy  filter  is  derived. 

INTRODUCTION 

Optimal  control  theory  is  involved  with  the  great  human  effort  to 
control  or  influence  processes  of  one  type  or  another.  The  objectives 
and  criteria  for  the  performance  of  a physical  system  may  be  diffused  or 
defy  tractable  analysis  in  many  situations,  but  the  basic  concepts  on 
which  to  proceed  have  been  established  in  control  theory.  One  first 
considers  a system  and  a process  through  which  the  state  of  the  system 
is  changing  in  time;  in  other  words,  some  action  or  motion  of  the  system 
takes  place  in  time.  This  behavior  of  the  system  is  described  by  a set 
of  time-dependent  variables  x (t)  = (x^,  . . . , x^)  which  are  called 
the  state  variables.  In  addition  to  the  state  of  the  system,  one  also 
considers  controls  by  which  the  process  in  question  can  be  influenced. 
These  controls  are  represented  by  a set  of  variables  u (t)  = 

(u.t  (t),  . . . , u (t))  which  are  called  the  control  variables. 
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At  a certain  instant  in  time,  say  t^,  the  state  of  the  system  is 
known  to  be  x^.  If  an  analysis  of  the  system  is  to  be  performed,  a sys- 
tem of  equations  must  be  specified  which  predict  the  state  for  t > t^ 
and  for  a given  control  function  _u.  ’ These  equations  are  called  the 
dynamic  equations  for  the  system;  they  may  take  the  form  of  an  ordinary 
differential  equation 


X (t)  = JE  (t,  x^  1L) 
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or  a difference  equation 


*n+l  = f K’ 


n' 


They  might  even  take  the  form  of  an  integro-differential-dif ference 
equation  or  a time  delay  equation,  but  they  cannot  take  on  a form  such 
that,  the  solution  at  some  time  t^  is  dependent  on  the  solution  in  the 
future,  t > t^.  The  dynamic  equations  must  reflect  this  principle  of 
nonanticipation.  One  does  not  violate  this  principle  by  choosing  a 
control  in  anticipation  of  the  future  and  thus  influencing  the  future 
state  of  the  system  based  on  estimated  future  information;  in  fact,  the 
choice  of  such  a control  is  actually  based  on  the  history  of  the  state 
of  the  system  available  at  the  time  of  the  choice. 

If  no  further  specification  of  system  performance  is  given,  every 
control  function  which  yielded  a physical  realizable  state  of  the  system 
for  t > tp  would  be  a solution  to  the  control  problem.  One  can  have  a 
meaningful  control  problem  only  if  there  is  a desired  objective,  a goal 
to  be  achieved  by  the  process.  Moreover,  it  is  not  sufficient  merely  to 
have  a goal;  there  must  be  a control  by  which  this  goal  can  be  achieved. 
This  control  could  be  the  case  of  no  control,  f(t,  x,  ii)  = f(t,  x) ; 
however,  it  must  exist.  Since  it  is  not  the  purpose  of  these  notes  to 
delve  into  all  the  mathematical  problems,  it  will  be  assumed  that  there 
exists  at  least  one  control  by  which  the  objective  can  be  achieved.  It 
will  further  be  assumed  that  any  control  function  used  in  the  sequel 
yields  a unique  state  function  x (t)  with  x (t^)  = x^;  the  state  func- 
tion is  obtained  by  solving  the  dynamic  equations. 

In  general,  there  are  a number  of  controls  which  could  yield  the 
desired  system  state.  From  among  this  set  of  possible  controls,  one 
would  like  to  choose  the  ”best”  control  with  respect  to  some  performance 
criterion.  For  example,  one  would  like  to  choose  the  control  so  that 
the  process  is  carried  out  with  a minimum  cost  in  fuel,  or  time,  or 
money.  In  the  sequel,  it  is  assumed  that  the  performance  criterion  can 
be  expressed  in  terms  of  a cost  function;  furthermore,  it  is  assumed 
that  the  cost  function  is  additive  with  respect  to  the  contribution  from 
each  time  interval.  An  example  of  such  a cost  function  is 
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G(x^,  T)  + J F(a,  X,  it)  da 

where  = 2t(T) . This  cost  function  is  dependent  on  the  final  state  of 
the  system  through  the  function  G and  on  the  intermediate  states  and  the 
control  function  through  the  function  F.  The  additive  property  of  the 
control  function  with  respect  to  the  intermediate  times  is  represented 
by  the  integral.  By  an  optimal  control  is  meant  that  control  which 
minimizes  the  cost  function;  it  is  this  function  which  is  the  desired 
result  of  optimal  control  theory. 

Any  process  that  is  being  controlled  is  subject  to  unpredicted 
disturbances,  and  these  can  make  a significant  difference  in  the  choice 
of  a control  function.  Suppose  the  dynamic  equations  of  a system  is 
given  by  the  differential  equation 

^ =u-fp(t) 

where  p(t)  represents  a disturbance.  The  behavior  of  the  system  in 
response  to  the  two  different  controls  (u^^  = - x)  and  (u2  = - e does 
not  differ  if  there  is  no  disturbance  (p  = 0) ; however,  if  a disturbance 
is  present,  the  response  is  significantly  different.  If  Xq  « 1,  the 
response  to  the  first  control  is  given  by 

X,  = e^  + e^  |*  e^p(a)  da 
•'0 


whereas  the  response  to  the  second  control  is 

r ^ 

X-  = e *"  + p(a)  da 

^ -'o 

Such  differences  could  conceivably  result  in  a different  choice  for  an 
optimal  control. 
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In  analyzing  systems  and  their  control,  one  must  find  a way  to 
represent  the  unpredictable  disturbances.  Such  disturbances  cannot  be 
modeled  by  analytic  functions  since  the  value  of  an  analytic  function  at 
any  point  is  predictable  from  its  value  on  an  arbitrary  short  interval. 

One  answer  to  modeling  these  disturbances  is  to  describe  them  as  stochastic 
processes.^  The  theory  of  such  processes  was  developed  to  model  the 
fluctuation  observed  in  physical  systems.  Wiener  processes  or  the 
Brownian  motion  process  are  of  particular  interest  to  the  stochastic 
control  problem;  many  of  the  disturbances  that  affect  a control  system 
can  be  modeled  by  processes  generated  from  Wiener  processes.  A Wiener 
process  is  a stochastic  process  in  which  the  statistical  properties  over 
the  interval  (t,  t+r)  are  the  same  as  those  over  the  interval  (s,  s+t); 
moreover,  the  behavior  of  the  process  is  independent  over  time  intervals 
which  do  not  overlap,  and  there  is  no  trend  in  the  behavior. 

Once  the  stochastic  disturbances  have  been  introduced  into  the 

control  theory,  the  problem  is  no  longer  deterministic.  The  state 

variables  and  control  variables  are  no  longer  predictable  but  must  be 

2 

described  by  their  statistical  properties.  Kalman  and  Bucy  provide  a 
solution  to  the  stochastic  control  problem  for  nonstationary  linear 
systems.  Their  solution  consists  of  using  an  optimal  filter  to  estimate 
from  the  observed  system  performance  the  state  of  the  system  in  terms  of 
the  conditional  mean;  the  estimated  state  is  fed  back  to  the  control 
signal  through  linear  feedback.  The  linear  feedback  is  determined  by 
solving  a deterministic  control  problem;  the  filter  depends  on  the 
disturbances  and  on  the  system  dynamics,  but  it  is  independent  of  the 
cost.  Although  the  nonlinear  stochastic  control  problem  or  its  equiva- 
lent, the  nonlinear  filter  problem,  has  not  been  solved,  some  headway 

3 

has  been  made  by  Bucy  and  Joseph;  this  lecture  considers  only  the 
linear  problem. 

^Astrom,  K.  J.,  ‘^Introduction  to  Stochastic  Control  Theory,"  Academic 
Press,  Inc.,  New  York  (1970). 

2 

Kalman,  R.  E.  and  R.  S.  Bucy,  "New  Results  in  Linear  Filtering  and 
Prediction  Theory,"  Journal  of  Basic  Engineering  Series  D,  American 
Society  of  Mechanical  Engineers,  Vol.  83,  pp.  95 — 108  (1961). 

3 

Bucy,  R.  S.  and  P.  D.  Joseph,  "Filtering  for  Stochastic  Processes  with 
Applications  to  Guidance,"  Interscience  Publishers,  Inc.,  New  York  (1968). 
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As  an  example  of  a control  problem,  consider  a ship  moving  through 
a current  of  water;  the  ship  is  a system  undergoing  a change  in  state. 

In  this  example,  the  state  is  the  position  (x,  y)  of  the  ship.  The 
parameters  which  control  the  motion  of  the  ship  are  the  power,  which 
determines  the  velocity  relative  to  the  water,  and  the  steerage  angle, 
which  controls  the  heading  angle  0.  In  this  simplification  of  the 
system,  the  dynamic  equations  are: 

X = V cos  0 + u(x,  y) 

y = V sin  0 + v(x,  y) 

where  u and  v are  the  velocity  of  the  current  in  the  x-  and  y-directions , 
respectively.  The  goal  might  be  to  go  from  point  A to  point  B.  If  it 
is  desired  to  reach  B in  the  shortest  possible  time,  the  cost  function 
would  be  the  accumulated  time;  if  it  is  desired  to  reach  B with  the 
minimum  expenditure  in  fuel,  the  cost  function  would  give  the  expended 
fuel  in  terms  of  x,  y,  V,  and  0.  A more  complicated  cost  function  would 
result  if  it  is  desired  to  reach  B in  the  least  time  with  a reasonable 
expenditure  of  fuel.  Both  the  power  and  steerage  angle  could  be  subject 
to  unpredictable  perturbations;  there  could  also  be  a stochastic  pertur- 
bation of  the  current. 

This  lecture  on  control  theory  first  treats  a deterministic  optimal 
control  problem  with  no  constraints  on  the  controls.  It  is  first  solved 
by  transforming  the  problem  into  a boundary-value  problem  for  an  ordin- 
ary differential  equation,  the  so  called  indirect  approach;  it  is  then 

solved  by  the  direct  method  developed  by  Bellman,  the  method  of  dynamic 

4 

programming.  The  big  contribution  of  modern  control  theory  to  the  de- 
terministic control  problem  has  been  the  extensions  to  controls  with 
constraints,  and  a discussion  of  constrained  controls  constitutes  an- 
other major  topic  of  the  lecture.  Still  another  important  area  is  the 
development  of  the  theory  of  stochastic  processes  necessary  in  the 


^Bellman,  R.  E.  and  S.  E.  Dreyfus,  **Applied  Dynamic  Programming,” 
Princeton  University  Press,  Princeton,  N.J.  (1962). 
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treatment  of  stochastic  controls.  Finally,  the  theory  of  Kalman-Bucy 
filters  is  given  and  their  solution  to  the  stochastic  control  problem  is 
presented  for  linear  systems. 

THE  OPTIMAL  CONTROL  PROBLEM 

In  these  lectures  the  simplest  optimal  control  problem  considered 
is  that  of  a state  variable  x(t)  and  a control  variable  u(t)  defined  on 
an  interval  0£t£T.  The  process  being  controlled  is  described  by  the 
dynamic  equations 


x(t)  = f^(t,  X,  u) 


(1.1) 


with 


x(0)  = Xq  (1.2) 

The  vector  is  twice  continuously  differentiable  with  respect  to  x and 
Lipschitz  continuous  with  respect  to  u;  this  latter  condition  means 
simply  that  there  is  a constant  L such  that  for  every  pair  of  control 
vectors  u and  v 


|^(t,  X.  u)  - f.(t,  X,  v)|  £ L|u  - v|  (1.3) 

For  each  control  vector  Uy  these  conditions  imply  that  the  state 
vector  Xy  which  is  obtained  from  solving  (1.1)  and  which  also  satisfies 
the  initial  condition  (1.2),  exists  and  is  unique.  Moreover,  from  among 
the  set  of  control  vectors,  it  is  assumed  that  there  is  a unique  control 
u which  minimizes  the  cost  function  C^.  The  cost  function  is  defined  by 
the  following: 


G(x^,  T)  + 


F (a , 2L»  u.)  da 


(l.A) 
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The  functions  F is  twice  continuously  differentiable  with  respect  to  x 
and  Lipschitz  continuous  with  respect  to  vi;  G represents  the  cost  at  the 
terminal  point  x(T)  = x^;  it  is  twice  continuously  differentiable  with 
respect  to  X^. 

Suppose  that  v is  an  optimal  control  vector,  and  consider  a slight 
deviation  6u  of  this  control  vector.  If 

ji(t)  = X 

u(t)  is  also  a control  vector,  as  can  be  seen  from  an  application  of  the 
theory  of  ordinary  differential  equations.  If  ^ is  the  state  vector 
associated  with  the  control  v,  the  new  control  u yields  a new  state 
vector  X given  by 


x(t)  = ^ + 65c 


where  6x  is  an  unknown.  Moreover,  since  ^ minimizes  the  cost  function, 
the  new  cost  function  is  greater; 


F(a,  X.  u)  do  + G(x^,T) 


F(o,  v)  do  + G(^^,T) 


(1.5) 


Since  the  old  state  vector  satisfies 


Z = f(t,  z,  v) 


and  the  new  one  satisfies 


Now  by  assumption  ^ is  twice  continuously  differentiable  with  respect  to 
x;  hence 


6x  = f.(t,  _z  -f  6x,  ^ + 6u)  - f^(t,  v) 

= ^ ^x  + _f  (t,  z,  + 6_u)  - f^(t,  v)  + 0(|^x|^)  (1.6) 

It  is  not  necessary  that  6u  be  uniformly  small;  indeed,  in  problems 
involving  bang-bang  controls,  this  is  not  at  all  true.  However,  there 
can  be  deviations  6_u  of  order  one  only  if  their  duration  is  short.  It 
can  be  proved  that  if  6u  satisfies  the  condition 

f |6u(a)|  da  < e (1.7) 

then  the  deviations  62c(t)  are  also  of  order  c.  Since  by  assumption,  ^ 
is  Lipschitz  continuous  with  respect  to  u, 

|^(t,  u)  - f(t,  v)  1 1 L|u  - v|  = 0(6u) 

Moreover,  it  follows  from  Equation  (1.6)  that  to  the  same  order  of 
approximation 


6^  = f 6 X + _f(t,  _z,  _u)  - f(t,  _z^,  y^)  (1-8) 


or  in  abbreviated  form 


This  equation  is  a linear  differential  equation  for  6x,  and  there 
are  standard  ways  for  solving  linear  differential  equations.^  One  first 
considers  the  linear  homogeneous  equation 

(1.10) 

where  in  our  case  ^ represents  the  vector  6x  and  A the  matrix  Let 


X^(t)  = T),  

be  the  solution  of  Equation  (1.10)  with  t)  = > the  Kronecker 

delta;  moreover,  let  ^(t,  t)  be  the  matrix  whose  column  vectors  are  the 
vectors  *I*(t,  t)  = t)  . The  matrix  $(t,  t)  is  called  the 

transport  matrix  or  fundamental  matrix  for  the  differential  Equation 
(1.10).  From  (1.10)  it  follows  that  as  a function  of  t 

II  (t,  T)  = A<D(t,  T)  (1.11) 


and  by  its  definition 


<I>(T,  T)  = I (1.12) 

where  I is  the  unit  matrix.  The  solution  ^(t)  is  given  in  terms  of  its 
value  at  t ” T by 

X(t)  = «I>(t,  T)  ii(T)  (1.13) 


^Coddington,  E.  A.  and  N.  Levinson,  "Theory  of  Ordinary  Differential 
Equations,"  McGraw-Hill  Book  Company,  Inc.,  New  York  (1955). 
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Hence 


I x.(^)  = t)  y(T) 

= $(t,  T)  $(T,  t)  y(t) 


or  if  X 0. 


I = <D(t,  T)  4>(t,  t)  (1.14) 

Differentiating  with  respect  to  t yields 

0 = II  (t,  T)  <I>(T.  t)  + <I>(t,  T)  1^  <I>(T,  t) 

= A $(t,  t)  4)(t,  t)  + 4>(t,  t)  I^  <I>(t,  t) 

= A(t)  + <J)(t,  T)  1^  $(T.  t) 

It  can  be  shown  that  $(t,  t)  has  an  inverse  and  that  this  inverse  is 
$(t,  t);  consequently 


1^  $(T,  t)  = - $ ^(t,  t)  A(t)  = - 4>(t,  t)  A(t) 
that  is,  ^(t,  t)  as  a function  of  T satisfies 

^ <D(t.  T)  = - 4>(t,  T)  A(t)  (1.15) 

Although  (1.15)  will  be  used  subsequently,  of  immediate  interest  is  the 
solution  to  the  inhomogeneous  linear  equation 

i.  = A ^ + ^(t)  (1.16) 
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with  ^(t)  = 0;  the  solution  is  given  by 


J' 


Z(t)  = J o)  g(a)  da 

T 


(1.17) 


which  can  be  verified  by  substitution  into  (1,16)*  For  the  control 
problem,  (1.17)  has  two  consequences:  it  can  be  used  in  conjunction 

with  (1.6)  to  obtain  an  estimate  for  the  order  of  magnitude  of  6x  and 
it  can  be  used  to  solve  (1.9). 

In  the  first  case, 

t 

|6x|  ‘ 


< J |<I>(t,  a)||  f(u)  - f(v)|  da  + J 0(6x^)  da 
0 0 


_<  M J I f(u)  - f(v)|da+  J 0(6x^)  da 


where  M is  a bound  for  From  (1.3) 


l^2il  1 LM  j |6u|  da  + J 0(6x^) 


da 


By  iteration 


. fo, 


< LMe  + j 0(6x  ) da 
0 


|6x|  < LMe  + J o(e^)  da  = 0(e) 


0 

The  second  case  is  of  more  interest,  of  course,  for  it  gives  an 
approximation  of  6x  good  to  the  second  order  in  e,  namely. 


6x 


. r-. 


0)  [f.(u)  - _f(v)]  da 


(1.18) 
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where  <I>  is  defined  by 


31> 


(t,  T)  = f^(t)  <t(t,  t) 


(1.19) 

Now  consider  the  difference  in  the  values  of  the  cost  function;  by 


(1.5) 


r 


F(a,  X,  ^)  - F(a,  v)  do  + G(x  , T)  - G(z  , T)  ^ 0 


Hence,  from  the  assumptions  on  F and  G, 

<.T 


[F  6x  + F(u)  - F(v)]  do  + G 6x(T)  > 0 

•'.v  X X 


By  (1.18), 
,T 


J [F  (T)  J <I>(T,  O)  (f(u(0))  - f(v(0)))d0  + F(u(t))  - F(v(t))]  dX 
0 ^ 0 


+ G $(T,  o)  (i(u(o))  - HHa)))  do  > 0 (1.20) 

i) 

If  the  order  of  integration  in  the  double  integral  is  changed, 

-T 


J F^(t)  j <1>(t. 


o)  [f(u(o))  - f(v(o))  dOdX 


0 0 
-T  ,T 


= F (T)$(T,  O)  dr  [f(u(o))  - f(v(o))]  do  (1.21) 

•b  -o 


The  vector  function  is  defined  by 


T 

£ (t) 


= - J 


F (X)  ^(X,  t)  dx  - G (T)  <D(T,  t) 

X X 


Recall  that  one  of  the  properties  of  $ was  (1.15) 


(1.22) 


(X,  t)  = - <D(x,  t)  f^(t) 
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Then 


• T 
£ 


F^(t)  <t(t,  t)  - I F^(t)  $(t,  t)  dx  - G^(T)  1^  <I>(T.  t) 


Fx(t)  -H 


< X 


(T)  $(T.  t)  f (t)  dx  + G^(T)<D(T,  t)  f (t) 

^ X X 


Fx(t)  - 


(X)  $(X,  t)  dx  - G^CT)  <I>(T,  t) 


f (t) 

X 


.T 

£ 


F - p f 

X — X 


(1.23) 


T T 

with  2,  (1)  “ terms  of  p , (1.20)  becomes 


J 


[-  P^(a)  (f_(u)  - f(v))  + F(u)  - F(v)]  da  > 0 


or 


I 


[-  F(v)  + f.(v)  ] - [-  F(£)  + p^  !.(£)]  do  ^ 0 (1.24) 


Since  6u  is  an  arbitrary  deviation  satisfying  only  (1.7),  it  can  be 
chosen  such  that  £ = X everywhere  except  on  some  arbitrary  interval;  as 
a consequence,  the  inequality  in  (1.24)  must  hold  for  the  integrand: 

- F (v)  + p^  f.(v)  i ^ (£^  X(£^ 

Define 

H(t,  u)  = - F(u)  + f^(ju)  (1.25) 


Then  H satisfies 


H(t,  v)  > H(t,  u) 


(1.26) 
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for  V,  an  optimal  control.  This  is  the  Pontryagin  maximal  principle 

T 

which  states  that  for  given  values  of  £ and  x at  time  t,  the  optimal 
control  v(t)  is  the  control  function  for  which  the  Hamiltonian  H(t,  u) 
is  a maximum. 

If  the  control  functions  are  sufficiently  smooth,  the  optimal 
control  is  that  control  for  which 


T 

F + p f 
u u 


(1.27) 


It  is  assumed  that  f is  differentiable  with  respect  to  _u;  prior  to  this 
equation,  f need  only  be  Lipschitz  continuous  with  respect  to  u.  This 
equation  is  a system  of  m equations  which  could  be  solved  for  the 
m control  functions  (u, ,...,u  ) in  terms  of  the  state  variables 

1 m T T 

(x, ,...,x  ) and  the  new  variables  (p,,...,p  ).  Consequently,  the 
In  In 

optimal  control  problem  has  been  reduced  to  a two-point,  boundary-value 
problem  for  an  ordinary  differential  equation: 


or 


X = f(t,  x»  u.) 


f 

X 


0 = - F 


u 


f 

u 


0 


9u 


(1.28) 


lA 


where 


x(0)  = Xq 

p(T)  = G^(T)  (1.29) 

T 

There  are  just  enough  conditions  to  determine  Xy  £ , and  u. 

T 

The  function  H contains  the  variables  2.  > course  t. 

Using  (1.26)  to  eliminate  u,  (1.28)  can  be  expressed  in  terms  of  the  set 

T 

of  dual  variables  x and  £*  = £ , where  the  prime  denotes  transpose  of 
the  vector;  the  resulting  system  is  the  familiar  canonical  form  of 
classical  mechanics. 


X = 


3p 


£ = 


3x 


(1.30) 


The  boundary  conditions  are  stated  in  terms  of  2Sq>  T;  for 

instance,  both  x.j>  and  T might  be  fixed,  or  either  one  might  vary  while 
the  other  is  fixed.  No  boundary  conditions  are  specified  directly  in 
terms  of  2.I  the  boundary  conditions  on  are  obtained  indirectly  by 
substitution  into  (1.29).  Equation  (1.29)  does,  however,  contain  a 
sufficient  set  of  conditions  to  pose  a two-point,  boundary-value  problem 
for  (1.30). 

Another  form  that  the  boundary  condition  at  t = T might  assume  is 
for  3^  and  T to  satisfy  an  end  condition  of  the  form 

M(x^,  T)  = 0 (1.31) 

where  M is  a twice  continuously  differentiable  vector  function  of  both 
its  arguments.  In  this  case  the  method  of  Lagrange  multipliers  will  be 
used  to  transform  the  optimal  control  problem  into  a corresponding 
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two-point,  boundary-value  problem.  The  vector  ^ is  introduced  here  as  a 
Lagrange  variable.  Now  the  problem  of  minimizing  the  cost  function  (14) 
is  replaced  by  the  problem  of  finding  the  unconstrained  minimum  of 


Cq(a) 


F(a,  X,  u)  + £’(x  - ^(a. 


X, 


u)  da 


+ T)  + G(x^,  T)  (1.32) 


The  boundary  condition  (1.31)  has  been  inserted  into  the  cost  function 
by  means  of  the  Lagrange  multiplier  Suppose  ;v(t)  is  the  control 
which  minimizes  C . For  a variation  6u  to  the  control  v,  let  x(t) 
denote  the  new  state  variable,  and  let  t = T+AT  be  the  time  at  the  new 
terminal  point.  The  main  difference  from  the  previous  argument  in  this 
section  is  that  the  terminal  time  is  T+AT  rather  than  AT.  The  new  cost 
is  given  by 

T+AT 

C (^)  = F(a,  ^ + 6jc,  V + 6^)  + ^'  (^  + 6^  - f(a,  £ + 6jc,  _v  + 6ia)  da 

^ 0 

+ G(x(T  + AT),  T + AT)  + Jj’M(x(T  + AT) , T + AT) 


Hence  the  increase  in  cost  C^(u)  ~ given  as: 


C^(u)  - C^(v) 


f' 

V)  = [F  6x  + F 6u  + q’6x  - 

— < X — u— 


X - q’f  6x  - q’f  6u]  da 
- X — u — 


+ + G^AT  + y ’ (^Ax,j,  + ^AT) 

.T+AT 


^ / 


F(a,  Xy  h) 


where  x = z + 6x 


d6x  f ^ 

J q’  ^ dt  = q’(T)  6x(T)  - I £*  6x  da 
_ dt  - - •'o 
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Hence 


0 < 


+ £’ 


(T)6x(T)  + + G^T  + p'  (^Ax^  + Kj.AT) 


^ / 


T+AT 


F(a,  x»  u)  + £'x  - £'f(a,  X,  u)  - F(T, 

- a'x(T)  + 5.'f.(T,  x^,  u^)  da 


+ [F(T,  x^,  u^)  + £’z^  + £’6x(T)  - £'f(T,  x^  u^) ] AT  (1.33) 

The  integral  from  T to  T+AT  is  a second  order  contribution  which  goes  to 
zero  faster  than  the  other  terms  as  AT  0. 

In  order  to  determine  Ax^,  consider  the  solutions  of  the  differ- 
ential Equation  (1.1),  which  have  the  initial  value  2Sq*  These  solutions 
satisfy  the  integral  equation 


x(t)  = 2Sq  + J 1(«?.  i 


) da 


Then 


Ax_  = (x(T)  - z(T))  + 


/ 


AT 


f(a,  X,  u)  da 


= 6x(T)  + ^ AT  + O(e^) 

where  the  geometry  of  the  proof  is  illustrated  in  Figure  1. 
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To  within  second  order 


6x(T)  = Ax^  - f_  AT 


(1.34) 


Within  this  order  of  approximation,  (1.33)  reduces  to  the  following: 
0 < I [F  - 6'  - q'f  ] 6x  + [F  - q'f  ] 6u  do 


+ [^^'(T)  + + m’^]  a X, 


+ [-l’(T)  I + + F(T,  x^,  u^)]  AT 


If  £ is  now  determined  so  that  the  coefficient  of  6jc  vanishes, 


q'  = F - q'f 

— X — X 


This  is  the  same  differential  Equation  (1.23)  that  £ satisfied;  our 
Lagrange  multiplier  can  then  be  identified  with  £ 


£ = £ 


(1.35) 


Moreover,  since  the  relationship  must  hold  independent  of  6u, 


F - n'f  = 0 
u — u 


(1.36) 


Since  there  are  no  longer  restrictions  on  Ajc  and  AT, 


£' (T)  + £'M^  + = 0 


(1.37) 


F + G + £’M  - £’i  = 0 


Introducing  the  Hamiltonian  (1.25)  yields 
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X 


3p 


£ “ 


3x 


(1.28) 


The  initial  condition  x(0)  “ together  with  the  terminal  conditions 


provides  a sufficient  number  of  conditions  to  determine  cl,  and  T. 

The  last  two  equations  in  the  system  (1.38)  are  obtained  from  (1.37). 

The  problems  of  optimal  control  theory  generally  reduce  to  a two- 
point,  boundary-value  problem  for  the  system  of  ordinary  differential 
equations  (1.30).  Bailey,  Shampine,  and  Waltman^  discuss  methods  for 
solving  such  two-point,  boundary-value  problems.  These  problems  are 
presently  solved  either  by  the  shooting  method  or  by  solving  a sequence 
of  simpler  boundary  value  problems  whose  solutions  converge  to  a solu- 
tion of  the  given  problem.  In  any  case,  very  few  of  these  problems  can 
be  solved  without  the  use  of  electronic  computers  either  digital  or 
hybrid. 

The  shooting  method  is  the  easier,  when  it  works.  It  consists  of 
supplementing  the  conditions  at  one  end  with  a sufficient  number  of 
assumed  conditions  to  yield  an  initial  value  problem.  The  initial  value 


^Bailey,  P.  B. , et  al.,  "Nonlinear  Two-Point,  Boundary-Value  Problems," 
Academic  Press,  Inc.,  New  York  (1968). 


M(x^,  T)  * 0 


£*  (T)  = - (y  + G^) 


(1.38) 


H(T,  u(T))  = 
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problem  is  solved;  the  solution  is  substituted  into  the  boundary  con- 
ditions at  the  other  end.  If  these  conditions  are  satisfied,  the  solu- 
tion to  the  initial  value  problem  is  the  desired  solution  to  the  two- 
point,  boundary-value  problem;  otherwise,  a new  set  of  assumptions  is 
made  based  on  the  discrepancy  between  the  actual  boundary  values  and  the 
calculated  values.  Hopefully,  as  one  continues  this  iteration  process, 
the  solutions  to  the  initial  value  problem  converge  to  a solution  of  the 
two-point,  boundary-value  problem.  The  shooting  method  may  not  con- 
verge, or  it  can  be  unstable,  that  is,  a small  variation  in  the  initial 
conditions  results  in  a large  variation  in  the  solution.  If  the  initial 
problem  is  unstable,  a small  error,  such  as  roundoff  on  a computer, 
could  cause  subsequently  computed  values  at  another  point  to  be  meaningless. 

Before  proceeding  to  the  direct  method  for  solving  the  optimal 

control  problem,  take  a second  look  at  the  Hamiltonian  H and  the  func- 
T 

tions  £ . Suppose  that  the  terminal  cost  G is  identically  zero;  the 
cost  function  is  then 


Further,  assume  that  every  point  in  an  open  neighborhood  N of  an  optimal 
trajectory  ^(t)  can  be  joined  to  the  initial  point  (0,  2Sq)  ^ trajec- 
tory x(t)  resulting  from  an  optimal  control.  This  assumption  makes  the 
minimal  cost  J a function  of  the  terminal  point  (T,  iri  N. 


0 


J(x^,  T)  = Min 


(1.39) 


0 


It  is  assumed  that  J is  twice  continuously  differentiable.  Then, 


J(x^  + Ax^,  T+AT)  = J(x^,  T)  + J^Ax^  -f  J^AT  (1.40) 
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By  the  definition  of  J,  there  is  a control  u.  + 6 u together  with  a 
trajectory  2i  such  that 


J(x^  + Ax^,  T+AT) 


F(o,  X + 6x,  u + 5u' 
(3 


5u)  do 


(1.41) 


where  u is  the  control  such  that 


f 


J(x  , T)  = J F(0,  X,  u)  do 
-T 


(1.42) 


From  (1.41)  and  (1.42) 


J(x^  + Ax^,  T+AT)  - J(x^,  T) 


J F(a,  X }±  ^ - F(o,  x>  u) 


0 

.T+AT 


F(o,  2i  + 'Sx,  ^ + 6^)  do 


T 

(F  6x  + F 6u)  dt  + FAT 
X — u — 


Now  from  (1.23),  the  equation  for _g  is 


Fx  = i’  + £’4 


Hence 


J(x^  + Ax^,  T+AT)  - J(x.j,,  T) 


I [(£'  + £’4)  + F^6u]  do  + 


FAT 


5x(T)  + j 


= £*  (T)  6x(T)  + J £'  (f  6x  - 6x)  + F^_6u  do  + FAT 

0 -V 
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From  (1.9) 


f 6x  - 6x  = - f 6u 

— X — — — U — ' 


Hence 

J(x^  + Ax^,  T+AT)  - J(x^,  T)  = P'(T)  (Ax^  - fAT) 

f' 

+ J (F  - o’f  ) 6u  da  + FAT 

0 u ^ -ti  - 

where  use  has  been  made  of  (1.34).  But  by  (1.27),  F^  “0;  so, 

J(x^  + Ax^,  T+AT)  - J(x^,  T)  = £'Ax^  + (F  - £*  (T)f ) AT 

= p'(T)  Ax^  - HAT 

= J Ax^  + J^AT 
X — T T 

where  the  last  equality  results  from  (1.40).  This  gives 


J = p' 

X ^ 


(1.43) 


and 


= - H (1.44) 

In  the  space  of  variables  (x^,  T) , the  vector  £*  is  the  gradient  of 
the  function  J;  it  is  normal  to  the  surfaces  of  constant  J;  H is  the 
Hamiltonian  of  the  function  J.  This  sheds  new  light  on  the  maximal 
principle.  Along  an  optimal  trajectory,  the  change  in  cost  J over  a 
given  time  step  AT  is  a minimum,  that  is,  H is  a maximum. 
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J 


'OPTIMAL 

TRAJECTORY 


I^AT— H 


Figure  2 — Constant  Cost  Fronts 


These  arguments  hold  only  if  the  terminal  cost  is  zero;  G = 0. 

RELATION  TO  DYNAMIC  PROGRAMMING 

The  partial  differential  equations  (1.43)  and  (1.44)  can  be  ob- 
tained by  the  method  of  dynamic  programming.  This  method  is  based  on 


ciple,  an  optimal  control  policy  has  the  property  that,  regardless  of 
the  initial  state  or  initial  decision,  the  remaining  decisions  must 
constitute  an  optimal  control  policy  with  regard  to  the  state  which 
results  from  the  first  decision. 

In  terms  of  the  cost  function 


the  Bellman  principle  takes  the  form. 

The  cost  C(jj)  is  a minimum  along  a curve  x defined  on  [0,  T]  if  it 
is  a minimum  along  each  later  part  of  the  curve,  that  is,  if 


the  Bellman  principle  of  optimality.^  According  to  the  Bellman  prin- 


0 


^Dreyfus,  S.  E.,  ’^Dynamic  Programming  and  the  Calculus  of  Variation,** 
Academic  Press,  Inc.,  New  York  (1965). 
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F(a,  X.  u)  da 


t 


is  a minimum  along  the  curve  x on  the  interval  (t,  T]  for  all  te[0,  T]. 

The  integral  is  dependent  on  the  end  point  (t,  x(t)).  If  one 
defines 

J(x,  t)  = min 
u 


F(a,  X,  u)  da 


(2.1) 


for  all  admisible  controls  u,  then 


J(x,  t)  = min  {F(t,  x,  u.)  6t)  + min 
u u 


t+6t 


F do 


or 


J(x,  t)  = min  {F(t,  x,  u)  6t  + J(x  + 6x,  t + 6t)}  (2.2) 

Ji 

This  equation  forms  the  basis  of  the  direct  methods  for  solving  control 

7 8 

problems,  described  by  Dreyfus.  Larson  extended  the  direct  methods  to 
constrained  problems. 

If  it  is  assumed  that  J has  partial  derivatives,  the  differential 

equations  (1.43)  and  (1.44)  can  be  obtained  from  (2.2).  Hence,  the 

boundary  value  problem  for  the  optimal  control  is  obtained.  If  the 

partial  derivatives  of  J exist,  the  right-hand  side  of  (2.2)  can  be 

expanded  in  a Taylor  series: 

J(x,  t)  = min  {F6t  + J(x,  t)  + ^ 

u 

(2.3) 

g 

Larson,  R.  E. , "State  Increment  Dynamic  Programming,"  American 
Elsevier  Publishing  Company,  Inc.,  New  York  (1968). 


24 


From  the  differential  equation  (1.1) 


Hence 

6x  = 

Since  6t  > 0, 

0 = min  {F  + J f + J } 6t 

tl 

u 

0 = min  {F  + J f + J } (2.4) 

t 

u 

In  order  to  find  the  minimum  of  the  term  in  brackets,  it  is  differ- 
entiated with  respect  to  u and  the  result  is  set  equal  to  zero.  This  is 
a necessary,  but  not  sufficient  condition;  however,  if  one  assumes  a 
minimum,  it  serves  the  purpose. 


F + J fu  = 0 (2.5) 

u X— 

By  (2.4), 

F-fJf+J  = 0 (2.6) 

x~  t 


Hence 

H = 0 

(1.43) 

J = -H  = -(F  + Jf) 
t X 


From  (1.25) 

J = (1.42) 

X 
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There  is  a difference  between  the  definition  of  H here  and  its 
definition  in  the  previous  section.  This  is  only  an  apparent  difference 
in  the  sign  of  F,  which  occurs  because  the  lower  limit  of  the  integral 
is  used  in  the  definition  of  J here  rather  than  the  upper  limit  as  used 
earlier.  Otherwise  there  is  complete  agreement  with  the  results  of  the 
indirect  method. 


In  most  applications,  the  control  or  the  state  variables  cannot  be 
chosen  arbitrarily  but  are  subject  to  constraints.  In  the  problem  of  a 
ship  moving  in  a current,  ship  speed  is  limited  by  the  maximum  power 
available.  The  constraints  can  generally  be  expressed  in  terms  of 
inequalities  of  the  form 


where  the  vector  inequality  simply  means  that  the  components  satisfy  the 
inequality.  The  number  of  components  in  the  vector  ^ is  the  number  of 
constraints  on  the  system.  The  analysis  does  not  depend  on  whether  both 
3c  and  u occur  implicitly  in  the  inequality;  one  can  have  constraints  on 
the  controls  and  not  on  the  state  of  the  system  or  vice  versa  without 
affecting  the  analysis. 

In  this  presentation,  the  variables  in  the  optimal  control  problem 
with  constraints  are  the  state  variable  2i(t)  and  the  control  variable 
]i(t)  defined  on  an  interval  0 ^ t £ T.  The  process  being  controlled  is 
described  by  the  dynamic  equation  (1.1): 


with  initial  condition  x = x ; the  state  and  control  variables  are 

— —c 

constrained  by  the  inequality  (3.1).  For  simplicity,  the  terminal  cost 
is  taken  as  zero,  G = 0,  and  the  cost  function  is  given  by  the  equation: 


CONSTRAINTS  ON  THE  CONTROL  AND  STATE  VARIABLES 


^(jc,  ii)  0 


(3.1) 


x(t)  = f(t,  X,  u) 


(3.2) 
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The  vector  and  the  cost  function  F are  twice  continuously  differentiable 
with  respect  to  x and  continuously  differentiable  with  respect  to  u. 

The  Lagrange  multipliers  will  be  used  here  to  reduce  this  problem 
to  a two-point,  boundary-value  problem.  As  in  (1.32),  the  differ- 
ential equation  is  introduced  into  the  cost  function  by  means  of  a 
Lagrange  multiplier  £. 

f' 

C (u)  = I F(a,  X,  u)  + £*  (x  - f(a,  X,  u))  da 
p-  •'o  -- 


which  yields  the  variational  equation 
.T 


1 

0 


(F  6 + F 6u  + o'  6x  - p'f  6x  - p'f  6u]  da 

x-x  u—  X—  u — 


£'  (T)6x(T) 


;x(T)  + J 


t - k'  - Ax 


+ (F  - p'f  ) 6u  da  > 0 

U —u  — — 


(3.3) 


The  differential  in  the  cost  is  greater  than  or  equal  to  zero  since 
it  is  assumed  that  the  variation  6u  is  around  an  optimal  control,  a 
control  which  minimizes  the  cost. 

Because  of  the  constraint  (3.1),  the  vector  6u  is  not  free. 

For  instance,  suppose  that  for  t between  t^  and  t2>  the  trajectory  £(t) 
due  to  the  optimal  control  £(t)  is  along  the  boundary  of  the  allow- 
able region;  see  Figure  3.  One  cannot  freely  choose  the  variation  5ja 
in  the  control  vector  for  _<  t £ and  still  expect  to  remain  in  the 
allowable  region  R. 

For  the  optimal  trajectory  £ and  control  £,  there  are  at  most  a 
finite  number  of  intervals  £ t £ tj^  + 1 such  that  equality  holds  for 
any  of  the  equations  in  (3.1).*  On  such  an  interval,  the  conditions 
(3.1)  can  be  split  into  two  sets 


*The  proof  of  the  statement  is  topological  and  beyond  the  scope  of 
these  notes. 
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R 


z(t) 

x(t)  = z + 6x 


Figure  3 — Constrained  Variables 


v)  “ 0 

and 

v)  < 0 (3.4) 


where  ^ ^2)- 

Consider  a new  vector  defined  by 

^(x,  u)  + ^(x,  ^)  = 0 

The  vector  ^ is  called  a defect  vector.  Along  the  optimal  trajectory, 
the  vector  ij/  can  also  be  split  into  two  component  vectors,  and 
which  correspond  to  the  component  vectors  of  The  component  vectors 
of  \(f  also  change  from  interval  to  interval.  Along  a given  interval 
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4 = 0 
> 0 


(3.5) 


Since  _v)  is  zero  on  this  interval,  either  v.>  or  both  are 

on  the  boundary  of  their  allowable  range.  From  previous  arguments,  it 
is  known  that  one  cannot  freely  choose  6u.  Only  those  values  of  are 
allowed  which  satisfy 


or  by  (3.4) 


+ 6x,  V + 6_u)  ^ 0 


+ 6x,  y_  4-  6u)  - v)  _<  0 


On  the  other  hand,  for  a neighboring  trajectory  to  z 


^ + 6xj  V + 6_u  4*  j];  + = 0 


‘^k+1^*  Since  ^ = - 


(})(^  4-  6x,  V 4-  6^)  - v;)  4-  6ij;  = 0 


(3.6) 


In  order  that  the  above  inequality  and  (3.6)  hold. 


6\j^^  > 0 (3.7) 

Moreover,  provided  the  variations  are  sufficiently  small,  is  free. 

If  ^ is  twice  continuously  differentiable,  then  it  follows  from 
(3.6)  that 

^ 6x  4-  ^ 6u  4-  = 0 (3.8) 
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Set  6u  = consider  the  first  equations  in  (3.8), 


N . = dim  • 


iix  % + + iiu2  *^^2  + = 0 


If  the  square  matrix  is  not  singular,  its  inverse  y exists,  and 


6u^  = - Y fiu2  - Y iix  ^ 

The  vectors  6u_2  and  63c  are  free;  the  vector  satisfies  (3.7).  If 

the  matrix  4-  is  singular,  the  first  N.  constraints  were  de- 
^^1  ^1 

pendent;  eliminate  the  dependent  constraints  and  start  again. 

The  contribution  to  the  cost  differential  (3.3)  from  the 
interval  j<  t £ following  integral: 

t. 


1 


•k+l 


itF,  - ?:  - £’4  - ^ix  -£'4/  4lx'  «£ 

+ - £'4^  - - £'4^)  r «£2 

- (F  - P ’ f ) Y 5ii  } dcj 
^ *^1 


Define  the  vector  by 


Ai 


- (F,^-  £*^)  Y 


(3.10) 


Then 


r*^k+l 


'k  = 


1 IF,  - £’  - £'4  + A[4i,l  4 


+ [(F„^  - £’4^)  + + XJS^)  do 
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The  vector  £ can  be  determined  so  that  the  coefficient  of  6x  vanishes: 


£’  = F - £’  4 + Aii; 


lx 


(3.11) 


Since  6^2  free,  the  usual  argument  that  6U2  is  zero  everywhere  except 
on  a small  interval  yields 


F - p’  f + =0 

U2  ^ -U2  -1  ^1U2 


(3.12) 


Now  6u  can  be  chosen  so  that  5u  * 0 for  t < t,  and  for  t,  . , < t. 

— — — k k+1  — 

In  this  case,  the  only  contribution  to  the  cost  difference  (3.3)  is  that 


due  to  I,  ; hence 
k 


r^k+i 

1 I,  - J 


do 


By  (3. 7) , ^ 0;  so 


X > 0 


(3.13) 


Let  the  Hamiltonian  be  defined  by 

H = - F + £’  £ - X'  ^ (3.1A) 

where  ^ is  defined  by 

X^  > 0 if  (})j  = 0 

X^  = 0 if  c|)^  < 0 
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The  differential  system  (1.28)  also  holds  for  this  H,  that  is. 


X = 3— 

- 9p 


3H 


9H 


(1.28) 


One  example  of  a constrained  control  problem  is  that  of  a forced 
harmonic  oscillator  in  which  the  magnitude  of  the  force  is  limited.  In 
this  problem,  the  force  is  the  control  and  the  process  is  one  of  chang- 
ing the  velocity  and  displacement  of  the  harmonic  oscillator.  It  be- 
comes an  optimal  control  problem  if  one  is  interested  in  finding  the 
force  or  control  which  reduces  the  oscillator  from  a given  velocity  and 
displacement  to  zero  velocity  and  displacement  in  minimum  time. 

The  equation  of  motion  for  the  forced  harmonic  oscillator  with  a 
limited  force  is  simply 


where  |f|  ^ M,  a given  constant.  Set  x = cz/M,  x = cot,  and  u * F/M 


where  the  control  function  satisfies  the  inequality  |u|  ^1.  This 
constraint  can  also  be  written  in  the  form 


ra  — 2 + cz  = F 


dt 


where  a)  = /c/M.  In  terms  of  these  nondlmenslonal  variables,  the  non- 
dimensional  form  of  the  equation  of  motion  is 


X + X = u 


(3.15) 
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(J)j^(u)  = (u  - 1)  £ 0 
4>2(u)  = - ( u + 1)  _<  0 


(3.16) 


The  optimal  control  problem  can  be  formulated  in  the  phase  plane. 

If  (x,  y)  are  the  phase  plane  coordinates,  the  equation  of  motion  (3.15) 
takes  the  form 


X = y 

y = - X + u 


(3.17) 


Starting  the  oscillator  at  a given  displacement  with  a given  velocity  is 
equivalent  to  assigning  a given  point  (x,  y)  = (a,  b)  in  the  phase  plane 
as  an  initial  condition  for  (3.17).  The  rest  state  of  the  oscillator  is 
represented  in  the  phase  plane  by  the  point  (0,  0),  the  point  of  zero 
displacement  and  velocity.  Hence,  the  optimal  time  control  problem  is 
one  of  finding  a control  u which  minimizes  the  time  between  states 
(a,  b)  and  (0,  0).  In  this  problem,  the  cost  is  given  by 


C^(u)  = T = 


(3.18) 


The  cost  function  F(t,  x,  u)  = 1. 

Set  £ = (p,  q).  Then  the  Hamiltonian  defined  by  (3.14)  is 

H = - 1 + py  + q(u  - x)  - X(u  - 1)  (u  + 1)  (3.19) 

and,  moreover,  (1.28)  takes  the  form 
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X 


y.  + u 


P 


0 = = Q “•  - 1)  - X(u  + 1) 


(3.20) 


Suppose  u is  an  optimal  control  which  reduces  the  oscillator  from 
the  state  (a,  b)  to  the  state  (0,  0)  in  the  minimal  time  T,  and  suppose 
|u|  <1  for  the  interval  £ T £ Suppose  q 0 on  £ T £ T^. 

By  (3.20),  q - 2Xu  = 0;  hence,  X ^ 0 on  (T^^,  T^) . A consequence  of 
X 0 is  that  4>  = 0;  hence,  if  q 0,  it  follows  that  |u(t)|  = 1 on  (T^,  T2) . 
In  other  words,  one  needs  to  look  only  for  the  optimal  control  among 
those  controls  for  which  |u(t)|  = 1. 

Now  |u|  =1  implies  u = + 1;  hence,  the  solution  of  (3.20)  is 
given  as: 


X + 1 = A sin  (t  + a) 


y = A cos  (t  + a) 


p = B sin  (t  + a^) 


q = B cos  ( X + a^) 


q = 2 X u 


(3.21) 
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Since  A ^ 0,  it  follows  from  the  last  of  these  equations  that  the  sign 
of  q is  the  same  as  the  sign  of  u.  Hence,  if  q changes  from  positive  to 
negative,  the  optimal  control  must  switch  from  +1  to  -1.  It  switches 
from  -1  to  +1  if  q changes  from  negative  to  positive. 

In  a neighborhood  of  the  origin,  the  optimal  trajectory  satisfies 

(x  + 1)^  + = 1 

Hence,  its  final  segment  is  either  on  the  circle  of  radius  1 about 

(-1,  0),  or  it  is  on  the  circle  of  radius  1 about  (1,  0);  see  Figure  4. 

Suppose  for  the  sake  of  argument  that  there  is  an  C > 0 such  that 

u(t)  = - 1 for  T - c ^ T < T.  The  last  segment  of  the  optimal  trajector 

2 2 

is  on  the  semicircle  {(x  + 1)  + y = 1,  0 ^ y}.. 

Between  (0,  0)  and  (-2,  0),  the  parameter  T would  change  along  this 
semicircle  by  the  amount  tt;  hence,  the  sign  of  q must  change  somewhere 
on  this  semicircle.  At  the  point  where  q changes  sign,  the  sign  of  u 
must  also  change,  and  u switches  from  -1  to  1.  The  optimal  path  continues 
backward  on  the  circle  of  radius  r^^  around  (1,  0)  until  either  (a,  b)  is 
reached  or  q changes  sign.  But  q does  not  change  sign  until  the  point 
is  reached  since  the  time  between  and  is  tt.  At  S2,  the  control 
would  switch  to  -1  and  the  optimal  trajectory  would  continue  back  on  the 
circle  of  radius  r2  around  (-1,  0) . This  process  is  continued  until  the 
point  (a,  b)  is  reached.  In  the  process,  one  switches  control  each  time 
one  of  the  following  semicircles  is  intercepted: 

[x  - (2n  - 1)]^  + y^  = 1,  y > 0,  n=0,l,2,...  (3.22) 

or 

[x  + (2n  - 1)]^  + y^  * 1,  y < 0 n=0,l,2,...  (3.23) 

The  curve  formed  by  these  semicircles  is  called  the  switching  curve;  see 
Figure  5. 
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Figure  4 — Optimal  Trajectory 


Figure  5 — Switching  Curve 


The  optimal  control  and  the  resulting  trajectory  in  the  phase  plane 
can  now  be  obtained  by  reversing  the  above  procedure.  If  (a,  b)  is 
above  the  switching  curve,  proceed  with  the  control  u * - 1.  The 
optimal  trajectory  will  be  along  the  circle 


in  the  direction  of  that  part  of  the  switching  curve  which  lies  to  the 
right  of  X = 0.  For  (a,  b)  on  the  switching  curve,  use  u = - 1 if 
X < 0 or  u = 1 if  X > 0.  If  (a,  b)  lies  below  the  switching  curve, 
start  with  u = 1 and  change  to  u = - 1 at  the  switching  curve. 

Change  the  sign  of  u at  each  intersection  with  the  switching  curve. 
When  u = 1,  the  optimal  trajectory  lies  on  a circle  with  center  at 
(1,  0);  when  u = - 1,  it  is  on  a circle  around  (-1,  0). 

Suppose  only  one  switch  in  u is  needed  to  reach  the  origin  from 
(a,  b) . Because  of  the  symmetry  of  the  problem  geometry  in  the  phase 
plane,  it  is  necessary  to  consider  only  those  cases  for  which  a = 1 
after  the  switch.  The  origin  is  then  approached  along  the  trajectory 


2 2 2 2 
(x  + 1)  + y = (a  + 1)  + b 


X 


1 “ cos  (T  - t) 


y = - sin  (T  - t) 


(3.24) 


2 2 

which  is  on  the  semicircle  {(x,  y)  | (x  - 1)  •+•  y = 1,  y ^ 0}  let 

Tg  be  the  time  at  which  the  switch  occurs.  The  optimal  trajectory 
for  T ^ is  given  by 


X = - 1 + A sin  (t  + a) 


y = A cos  (t  + a) 


(3.25) 
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where  A and  a are  constants  defined  by 

A sin  a = a + 1 
A cos  a = b 

By  (3.24)  and  (3.25),  the  switching  time  must  satisfy 

1 - cos  (T-t)  =1+A  sin  (x  + a) 
s s 

- sin  (T  - T ) = A cos  (x  + a) 
s s 

Elimination  of  x^  from  these  equations  yields  a relationship  between  the 
terminal  time  T and  the  initial  point  (a,  b),  namely, 

(a  + 1 + cos  T)^  + (b  + sin  T)^  = 4 (3.26) 

By  definition,  time  fronts  are  the  curves  which  connect  initial 
points  having  the  same  terminal  time  T.  Equation  (3.26)  can  be  used  to 
determine  the  time  fronts  for  T ^ tt.  If  T = 0,  the  time  front  is  simply 
the  origin;  if  there  are  no  switches  in  the  control,  the  initial  point 
is  an  endpoint  of  the  curve  connecting  all  initial  points  from  which  the 
origin  is  reached  with  one  switch  in  time  T.  More  than  one  switch  would 
require  T > tt.  From  (3.26),  the  time  fronts  for  0 < T ^ tt  are  segments 
of  the  circle  of  radius  2 around  the  point  (-1  - cos  T,  - sin  T) ; see 
Figure  6.  It  is  the  segment  of  the  circle  which  lies  above  the  switch- 
ing path.  At  the  switching  path,  the  time  front  is  tangent  to  the 
vertical  line  x = constant  for  x > 0;  at  the  opposite  end,  it  is  tangent 
to  the  switching  curve.  For  T = tt,  the  time  front  is  a circle  of  radius 
2 around  the  origin. 
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Figure  6 — Time  Fronts 
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STOCHASTIC  SYSTEMS 


Stochastic  control  theory  was  first  applied  in  this  country  at  the 
Massachusetts  Institute  of  Technology  during  World  War  II  to  synthesize 
fire  control  systems.  In  the  1960’s  it  was  applied  to  space  navigation, 
guidance,  and  orbit  determination  in  such  well-known  missions  as  Ranger, 
Mariner,  and  Apollo.  Applications  of  the  filtering  theory,  aspects  of 
control  theory  include  submarine  navigation,  fire  control,  aircraft 
navigation,  practical  schemes  for  detection  theory,  and  numerical  in- 
tegration. There  have  also  been  industrial  applications;  one  example 

involved  the  problem  of  basic  weight  control  in  the  manufacture  of 
1 

paper. 

The  filtering  and  prediction  theory  developed  by  Wiener  and  Kolmogorov 
forms  the  cornerstone  of  stochastic  control  theory.  It  provides  an 
estimate  of  the  signal  or  the  state  of  a process  on  the  basis  of  observa- 
tion of  the  signal  additively  corrupted  by  noise.  Unfortunately,  the 
Wiener-Kolmogorov  theory  cannot  be  applied  extensively  because  it  requires 
the  solution  of  the  Wiener-Hopf  integral  equation.  It  is  difficult  to 
obtain  closed  form  solutions  to  this  equation,  and  it  is  not  an  easy 

equation  to  solve  numerically. 

2 

Kalman  and  Bucy  give  a solution  to  the  filtering  problem  under 
weaker  assumptions  than  those  of  the  original  Wiener  problem.  Their 
solution  makes  it  possible  to  solve  prediction  and  filtering  problems 
recursively  and  is  ideally  suited  for  digital  computers.  Basically,  it 
can  be  viewed  as  an  algorithm  which,  given  the  observation  process, 
sequentially  computes  in  real  time  the  conditional  distribution  of  the 
signal  process.  The  estimated  state  of  the  process  is  given  as  the 
output  of  a linear  dynamical  system  driven  by  the  observations.  One 
determines  the  coefficients  for  the  dynamical  system  by  solving  an 
initial  value  problem  for  a differential  equation.  This  differential 
equation  is  easier  to  solve  than  the  Wiener-Hopf  equation. 

Our  attention  here  will  be  limited  to  linear  systems  with  quadratic 
cost  functions.  In  this  case  the  solution  of  the  optimal  control 
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problem  is  given  by  the  separation  theorem.^  The  solution  consists  of 
an  optimal  filter  for  estimating  the  state  of  the  system  from  the  ob- 
served data  and  a linear  feedback  of  the  estimated  state  of  the  system; 
see  Figure  7. 


Figure  7 — Stochastic  Control  System 


The  optimal  filter  is  the  Kalman-Bucy  filter,  which  will  be  dis- 
cussed in  detail  in  the  next  section;  the  linear  feedback  is  the  same  as 
would  be  obtained  if  the  state  of  the  system  could  be  measured  exactly 
and  if  there  were  no  randum  disturbances  in  the  system.  Thus,  the 
linear  feedback  can  be  determined  by  solving  a deterministic  problem. 
Because  of  time  limitations,  we  will  not  prove  but  merely  accept  the 
separation  theorem. 

One  objection  to  the  use  of  stochastic  control  theory  is  that  the 
process  to  which  the  theory  is  applied  may  not  be  random  but  merely 
irregular.  For  instance,  the  traffic  flow  on  the  Washington  Beltway  may 
not  be  truely  random  but  it  is  certainly  highly  irregular.  If  I need  to 
reach  Dullis  Airport  from  DTNSRDC  by  1 pm,  it  might  take  me  45  to  50 
minutes;  but  to  reach  the  airport  at  6 pm,  I would  have  to  allow  2 
hours.  The  reason  for  this  variation  in  lead  time  is  that  there  will  be 
bumper- to-bumper  traffic  on  the  Beltway  during  the  rush  hour  and  any 
accident  brings  this  traffic  to  a halt.  It  is  not  the  microscopic  but 
the  macroscopic  properties  of  the  traffic  flow  that  govern  our  lead  time 
estimate.  The  traffic  flow  could  be  analyzed  as  a stochastic  process; 
such  a model  would  be  acceptable  provided  it  predicted  the  macroscopic 
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properties  of  the  flow.  This  is  analogus  to  using  linear  models  in  the 
deterministic  case.  If  the  predictions  agree  with  the  experimental 
results,  the  linear  theory  is  said  to  be  good;  if  they  do  not,  then  the 
process  is  said  to  be  nonlinear.  In  using  a statistical  model,  one 
should  recognize  that  it  is  only  a model  and  not  the  actual  process,  and 
one  should  continually  strive  to  determine  the  accuracy  of  his  models. 

There  are  many  reasons  in  favor  of  applying  stochastic  theory.  The 
solution  of  the  stochastic  problem  may  be  possible  whereas  the  determin- 
istic theory  may  be  hopelessly  impossible.  In  many  problems  such  as 
that  of  traffic  flow,  one  may  not  be  interested  in  the  microscopic 
properties  but  merely  in  certain  macroscopic  properties.  In  the  control 
problem,  the  stochastic  model  distinguishes  between  open  and  closed 
looped  systems  but  the  deterministic  model  does  not.  Another  reason  for 
using  a stochastic  model  may  be  that  this  model  is  closer  to  the  physics 
of  the  actual  situation. 

In  any  case  the  purpose  of  this  section  is  to  lay  the  ground  work 
for  stochastic  control  theory.  Our  attention  will  be  focused  on  certain 
concepts  of  stochastic  processes  and  random  differential  equations. 

To  describe  a stochastic  process  rigorously  would  require  measure 

theory  and  a great  deal  more  time.  Our  approach  will  therefore  not  be 

rigorous,  but  hopefully  it  will  be  complete  enough  to  get  across  the 

9 

basic  ideas.  For  the  rigorous  approach,  see  either  Doob  or  Gikhman  and 
Skorokhod.^^ 

A real  random  variable  ^ is  a set  of  numbers  or  events  together 
with  a probability  measure  defined  on  this  set.  It  is  characterized  by 
its  distribution  function  F(x)  which  is  defined  by 

F(x)  * P {?  < x} 


9 

Doob,  J.  L. , "Stochastic  Processes,"  Wiley,  Inc.,  New  York  (1963). 

^^Gikhman,  I.  I.  and  A.  V.  Skorokhod,  "Introduction  to  the  Theory  of 
Random  Processes,"  W.  B.  Saunders  Company,  Philadelphia,  Pa.  (1969). 
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where  P ^ x}  is  the  probability  that  ^ is  less  than  or  equal  to  x. 

The  distribution  function  is  nonnegative,  nondecreasing,  and  continuous 
from  the  left;  also  F(-  »)  = 0 and  F(»)  = 1. 

Analogously,  if  ^ is  an  n-truple  of  random  variables,  its  distri- 
bution function  is  a function  of  n real  variables. 

F (x-  , x« , • • • , X ) *“  P ^ X-i  * • • • * ^ ^ } 

iz  n 1 — i n — n 

and  F is  called  a joint  distribution  function  of  the  variables  C,  • The 

k 

function  F(xj^,  uniquely  defined  in  n-dimensional  Euclidian 

space  E^,  is  non-decreasing,  and  is  continuous  from  the  left  with  respect 
to  each  variable.  Furthermore, 

F(Xj^,  X2 » • • • ^ ^i+2* " " " ^n^  ~ ^ 


and 


F(x 


1’ 


00)  = 


(X 


1’ 


Xi) 


where  denotes  the  distribution  function  of  the  i-truple  • 

A random  function  or  a stochastic  process  is  a random  variable  C(t) 
which  is  a function  of  time.  As  time  varies,  C(t)  describes  the  evolu- 
tion of  the  process.  If  a random  process  is  recorded  as  it  evolves,  the 
recorded  function  ^(*)  describes  only  one  of  the  many  possible  ways  in 
which  the  process  might  have  developed.  The  recorded  function  C(’)  is 
called  a sample  function  of  the  random  process.  For  each  fixed  value  of 
t,  the  quantity  ^(t)  is  a random  variable. 

Whereas  a random  variable  is  characterized  by  a distribution  function, 
a stochastic  process  is  characterized  by  a set  of  joint  distribution 
functions.  Assume  that  it  is  possible  to  assign  a probability  distribution 
to  the  multidimensional  random  variable  = (^(t^), 
for  any  n and  arbitrary  times  t^.  The  distribution  function 
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F y y • * • 9 y > • • • > 


t^)  = P £ X^, • . . , 


n — n 


is  called  the  finite-dimensional  distribution  of  the  stochastic  process 
5(t).  For  F to  be  a distribution,  it  must  satisfy  the  following  com- 
patibility conditions: 


F (^2^  y ^2  * * 


Xoo 

i > > 


tn>  - F(x^, 


^2  > • • 


'1’ 


for  i < n and 


F(x 


1’* 


- 2^  > • • • 


) 


where  arbitrary  permutation  of 

The  mean  value  of  a stochastic  process  is 


the  indicies  1, 
defined  by 


2 

^ , • • • , 


n. 


m(t)  = E[5(t)]  = 


F(x, 


t) 


where  E is  the  mathematical  expected  value.  The  mean  value  is  thus  a 
function  of  time.  Higher  moments  of  ^ are  defined  similarly. 

The  covariance  of  the  stochastic  process  is  given  by 


r(s,  t)  = cov  [C(t),  C(s)]  = E I(C(t)  - m(t))  (C(s)  - m(s))] 


(x  - m(t))  (y  - m(s))  d F(x,  y;  t,  s) 


Our  definition  of  a stochastic  process  is  very  general,  and 
most  systems  which  come  under  this  definition  would  be  mathematically 
unmanageable.  Some  specialization  of  the  theory  which  makes  it  possible 
to  characterize  the  distribution  of  ^(t^),  ^ simple 

way  are  particularly  attractive.  For  instance,  if  the  distribution  of 
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^(t j^) , • . . is  identical  to  the  distribution  of  + t), 

C(t2  + + t)  for  all  T and  all  arbitrary  choices  of  the 

times  t-,,..,t  , then  the  stochastic  process  C(t)  is  said  to  be  stationary. 

in  2 

If  only  the  first  and  second  moments  E[^]  and  E[^  ] of  the  distributions 

are  equal,  then  the  process  is  weakly  stationary. 

Our  discussion  of  control  systems  has  been  limited  to  systems  in 
which  knowledge  of  the  system  at  time  t together  with  the  governing 
equations  suffices  to  describe  its  future  evolution.  Knowledge  of  the 
past  when  the  present  is  given  is  superfluous  relative  to  the  future 
evolution  of  the  system.  The  stochastic  system  analogy  of  this  situation 
is  the  Markov  property  for  random  processes;  these  are  stochastic  process- 
es in  which  the  past  and  future  of  the  processes  are  conditionally 
independent.  In  order  to  define  a Markov  process,  the  conditional 
probability  and  the  transition  probabilities  have  to  be  defined.  The 
conditional  probability  P(a|b)  is  the  probability  that  A will  occur  if  B 
has  occurred.  Given  a sequence  of  times  t^  < t2  t^  < t,  the 

probability  that  ^(t)  ^ x if  the  sample  function  C(*)  has  already  taken 
the  values  C(t2>  , • • • » is  denoted  by  P(C(t)  _<  x | , . . . , 

^(t^)).  A stochastic  process  is  said  to  be  a Markov  process  if 

P(C(t)  < x|^(t^),...,  C(t^))  = P(C(t)  < x|C(t^)) 

The  transition  probability  distribution  F(x,  t|y,  s)  is  defined  by  F(x, 
t|y,  s)  = P(^(t)  £ x|^(s)  = y).  If  a stochastic  process  is  a Markov 
process,  its  finite  distribution  functions  are  given  by 

F(x^,  X2>  • • • > ^1’  ’ * ' ’ ^n^ 

F(x^;  t^)  F(x2,  t2|xj^,  tj^)...F(x^,  '^n-1^ 
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This  results  from  an  application  of  the  Baye  rule.  A Markov  process  is 
thus  defined  by  two  functions,  the  absolute  probability  distribution 
F(x,  t)  and  the  transition  probabilities  F(x,  t|y,  s) . 

Consider  a system  with  the  following  dynamic  equation: 

X = ^(t,  (t)  (A.l) 

where  ^ is  a small  parameter  and  w is  a stochastic  process.  Since  w is 
stochastic,  the  state  of  the  system  x will  also  be  stochastic;  thus,  we 
are  interested  in  solving  stochastic  differential  equations.  Further- 
more, our  interest  is  not  with  a particular  sample  function  x(*)  which 
is  a particular  discription  of  the  state  of  the  system  during  one  run 
through  the  process;  our  interest  is  with  the  statistical  properties  of 
the  stochastic  process  x (t). 

Consider  the  linear  stochastic  differential  equation 

dx  = A X dt  + dw  (^-2) 

where  w is  a stochastic  process.  In  order  to  make  some  progress  in 
finding  the  statistical  properties  of  x,  assume  that  w is  a Wiener 
process. 

A Wiener  process  is  a Markov  process  which  satisfies  the  following 
conditions: 

1.  It  is  a second  order  process;  that  is,  for  all  t 

E[w^(t>]  < » 

Hence,  the  mean  m(t)  exists  as  well  as  the  covariance  function 

r(s,  t)  = cov  [w(t),  w(s)] 
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2.  The  process  has  independent  increments;  that  is,  for  arbitrary 

times  t-  < t,,  < ...  < t , the  increments 
12  n’ 

x(t  ) - x(t  ),  x(t  ) - x(t  ),...,x(t-)  - x(t  ),  x(t  ) 
n n-i  n-1  n-2  I 11 

are  independent.* 

3.  The  distribution  of  x(t)  - x(s)  for  arbitrary  t and  s depends 
only  on  t-s.  In  this  case,  the  process  is  said  to  have  stationary 
increments. 

4.  The  transition  probabilities  are  Gaussian.  In  the  one- 
dimensional case,  the  transition  probability  density  is 

1 2 

p(t  + At,  w|t,  0)  = exp  - w /2At 

5.  w(0)  = 0 with  probability  one,  and  E[w(t)]  = 0 for  all  t > 0. 
Sample  functions  of  a Wiener  process  have  interesting  properties. 

They  can  be  continuous  functions  but  are  nowhere  differentiable.  Their 
paths  are  of  infinite  length.  Yet  it  is  for  just  such  perturbations 
that  (4.2)  will  be  solved. 

If  w in  (4.2)  had  bounded  variation,  the  solution  could  be  written 
in  terms  of  the  transport  matrix  4>(x,  t)  of  the  linear  system 

y = A y (4.3) 


The  solution  of  (4.2)  would  be 

x(t)  * 4>(t,  0)  c + I <Kt,  t)  d w(t)  (4.4) 

0 

where  the  value  of  x at  t = 0 is  the  random  variable  c.  The  expectation 
of  c is  m and  its  covariance  matrix  is  F. 


*Independent  random  variables  are  defined  on  page  7 of  Doob. 
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The  integral 


J' 


4>(t,  t)  d w(t) 


is  a stochastic  integral.  Since  the  transport  matrix  $(t,  x)  is 
deterministic  and  has  continuous  derivatives,  one  way  of  defining 
this  integral  is  through  integration  by  parts. 


r 


$(t,  t)  d w(t)  = $(t,  t)  w(t)  - $(t,  0)  w(0) 


- J' 


3t 


(t,  t)  w(t)  dT 


It  follows  from  (1.15)  and  other  properties  of  the  transport  matrix 
that 


J' 


<^(t. 


t)  d w(t)  = w(t)  - $(t,  0)  w(0)  + 


$(t,  t)  A(t)  w(t)  dT 
(4.5) 


The  integral  on  the  right  exist  for  almost  all  sample  functions  since 
the  sample  functions  of  w(t)  are  almost  all  continuous.  This  way  of 
defining  the  integral  has  the  desirable  feature  that  the  integral  can 
be  interpreted  as  an  integral  of  sample  functions.  It  does  not,  how- 
ever, preserve  the  intuitive  idea  that  the  integral  is  a limit  of  sums 
of  independent  random  variables  nor  can  it  be  extended  to  the  case 
where  <I>  is  stochastic.  Doob  gives  a more  formal  definition  of  the 
integral  together  with  detailed  proofs  of  its  stochastic  properties. 

The  expected  value  of  this  integral  is  computed  as  follows: 


48 


$(t,  t)  d w(t)  = E[w(t)]  - ^>(t,  0)  E[w(0)] 


<I>(t,  t)  A(t)  w(t)  dT 


= m(t)  - ^>(t,  0)  m(0)  + 


<I>(t,  t)  A(t)  m(T)  dT 


Hence 


, t)  d w(t) 


t)  d m(T)  (^-6) 


The  properties  of  the  solution  of  the  stochastic  differential 
equation  (A. A)  will  now  be  investigated.  Since  x is  a linear  function 
of  a normal  process,  it  is  also  normal  and  can  be  characterized  com- 
pletely by  the  mean  value  function  and  the  covariance  function.  Since 
the  expected  value  of  the  Wiener  process  w(t)  is  zero. 


E[x(t)]  = 4>(t,  0)  E[c]  + E 


= $(t,  0)  m^ 


where  is  the  expected  value  of  the  initial  condition  c.  Hence 


nix(t)  = E[x(t)]  = 4>(t,  0)  m^ 


(A. 7) 


Taking  derivatives  yields 


dm 


dT  ' ft  '"o  ""x 


A9 


Thus  the  mean  value  satisfies  the  linear  differential  Equation  (A. 3). 

The  covariance  matrix  is  more  difficult  to  compute.  In  order  to 
simplify  the  calculations,  assume  m^  = 0;  hence,  E[x(t)]  = 0.  This  can 
always  be  achieved  by  subtracting  m^  from  x.  For  s ^ t, 

R(s,  t)  = cov  [x(s),  x(t)]  = E[x(s)  x^(t)] 


1 

1 .r 

= E 

$(s,  t)  x(t)  + J $(s,  0)  d w(0) J 

1 21  (t) 

= $(s,  t)  E[x(t)  x’’^(t)]  + J <Ks,  a)  E[d  w(0)  x'^(t)] 

t 

= <Ks,  t)  R(t,  t)  (4.9) 

The  integral  is  zero  since  w(o)  and  2i(t)  independent  for  s ^ t. 

T 

Set  P(t)  = R(t,  t)  = E[x(t)  X (t)].  Then  P(t)  is  the  variance  and  is 
therefore  the  function  of  interest. 


P(t)  = E 


$(t,  0)  c + 


J 4>(t,  T)  d u(T)^ 


^ J' 


+ ^ 0)  c + J $(t,  o)  d w(o) 


= $(t,  0)  E[c  c^]  0) 


+ $(t,  0)  E 


fd„ 


T T 

^0)  0) 


^ 1' 


$(C,  T)  E[d  w(t)  c 


0) 


J J $(t,  t)  E[d  w(t)  d W^(0)]  $^(t,  0) 


0 0 
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The  increments  of  the  Wiener  process  are  independent  of  C;  hence 


E [c  d^  w(o)]  = E [d  w(t)  c^]  = 0 


Moreover,  from  the  properties  of  the  Wiener  process 


E [d  w(T)  d w (a)]  = 0 


if  dx  and  do  have  no  parts  in  common;  otherwise 


E [d  w(T)  d w (x)]  = R dx 

w 


where  is  the  covariance  matrix  of  the  Wiener  process  w.  The  final 


expression  for  P is  then 


P(t)  = 4>(t,  0)  r 4>'(t 


, 0)  + j 4>(t,  X) 


(X)  <I>"(t,  X)  dx 
(4.10) 


A differential  equation  for  P can  be  obtained  from  this  expression 
for  P simply  by  differentiating 


dt 


- <Kt,  o)J  r 4>'^(t,  0)  + <i>(t,  0)  r ^ <D^(t,  0) 

+ <Kt,  t)  R^(t)  <f'^(t,  t)  + J ^ 


dT 


+ J 4>(r,  X)  R^(X)  ^ $^(t,  x) 


dx 
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The  transport  matrix  satisfies 


and 


(t,  T)  = A <Kt,  T) 


^^4^^  - T)  a" 


Hence 


^ = A <D(t,  0)  r $^(t,  0)  + <i>(t,  0)  r $'^(t,  0) 


+ R (t)  + J A <I>(t,  T)  R (T)  $’'(t,  T) 

W w 


dx 


+ I <l>(t,  T)  R„(t)  4>^(t,  T)  A^  dT 
w 


^ = A I $(t,  0)  r $^(t,  0)  + J 4>(t,  t)  R.,(T)  4>^(t,  T)  dT 


w 


<i>(t,  0)  r (t,  0)  + 


j\u, 


T)  T)  dT  1 A^ 


+ R (t) 
w 


Thus  from  (4.10) 


dP  T 

■^=AP  + PA  +R 
dt  w 


p(0)  = r 


(4.11) 

(4.12) 
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THE  KALMAN-BUCY  FILTER 

The  solution  of  the  optimal  control  problem  for  a linear  stochastic 
system  is  given  by  the  separation  theorem.  It  consists  of  an  optimal 
filter  for  estimating  the  state  of  the  system  from  the  observed  data  and 
a linear  feedback  of  the  estimated  state  of  the  system;  see  Figure  7. 

The  linear  feedback  is  the  same  as  the  feedback  that  would  be  obtained 
if  there  were  no  stochastic  perturbation  of  the  system.  This  section 
will  develop  the  explicit  computational  schemes  for  solving  the  filter- 
ing problem. 

Suppose  we  have  the  stochastic  process  described  in  the  previous 
section 

dx  = A X dt  + d w(t)  (5.1) 

x(0)  = c (5.2) 

where  w(t)  is  a Wiener  process  and  c is  a Gaussion  zero  mean  n-vector. 

In  an  actual  case  in  which  the  process  is  realized,  it  is  important  to 
know  the  state  of  the  system.  It  is,  however,  not  always  possible  to 
measure  x directly;  instead,  a set  of  quantities  z(t)  dependent  on  x are 
measured.  Assume  that  the  dependence  of  z on  x is  linear  and  is  given 
by 

dz  = H X dt  + dv  (5.3) 

where  the  perturbation  v is  a Wiener  process  independent  of  x. 

The  filter  problem  can  be  formulated  as  follows.  Assume  that  a 
realization  of  the  output  z has  been  observed  over  the  interval 
0 < X < t.  Determine  the  best  estimate  of  the  value  of  the  state  vector 
X at  time  t.  It  is  assumed  here  that  the  admissible  estimates  of  x are 
linear  functionals  F(z)  of  the  observed  output  z.  The  criterion 
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for  determining  the  best  estimate  is  that  the  mean  square  estimation 
error  be  a minimum.  This  best  estimate  x(t)  is  dependent  on  the  values 
of  z(t)  in  the  interval  0 _<  l ^ t,  and  it  can  be  proved  that  it  is  a 
linear  combination  of  the  values  of  z on  this  interval. 


Since  z(x)  is  a stochastic  variable,  x(t)  is  a stochastic  integral. 

Interpolation  and  extrapolation  are  two  problems  that  are  related 
to  the  filtering  problem.  The  interpolation  problem  is  one  of  estimating 
the  state  at  some  time  T < t;  the  extrapolation  problem  is  one  of  esti- 
mating it  at  some  time  x > t.  This  latter  problem  is  the  one  which  is 
of  interest  to  the  stock  market  investor. 

The  condition  that  x(t)  is  the  best  estimate  from  among  all  linear 
functionals  of  z(t)  for  the  state  vector  x in  the  least  squares  sense  is 
stated  mathematically  as  follows.  For  every  constant  vector  X and 
linear  functional  F, 


0 


(5.4) 


E[{x'^(x(t)  - 5(t))}^]  < E[{A^(x(t)  - F(z))}^]  (5.5) 


where  all  variables  have  a zero  mean. 


E[x(t)]  = E[x(t)]  - E[F(z)]  = 0 


Now  set 


X * X - X 


where  x is  called  the  minimum  error  vector. 


E[(X^  x)^]  < E[X^(J  + (F(z)  - x))^] 


< E[(X”^  x)^]  + 2E[X’’^  X X^  (F(z)  - x)  ] 


+ E[(X’^  (F(z)  - x))^] 
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For  all  X and  F(z),  the  criterion  (5.5)  requires 


E[(X’^  (F(z)  - x))^]  + 2E[X'^  X (F(z)  - x)  ] > 0 


This  can  be  true  only  if 

0 = Eix"^  X X^  (F(z)  - x)]  = X^  E[x(F(z)  - x)^]X 


But  this  implies  that 


E[x  (F(z)  - k)'^]  = 0 


for  any  linear  combination  F(z)  of  elements  of  z;  hence 


E[x  F*^(z)  ] = 0 


(5.6) 


An  integral  equation  for  the  kernel  K(t,  t)  can  be  derived  from 
(5.6).  This  kernel  is  not  a stochastic  quantity,  and  it  can  be  de- 
termined independent  of  the  realization  z(*).  For  F(z)  = z(t)  - 
z(o),  0 ^ £ T ^ t,  the  expression  (5.6)  yields 

E[x(t)  (z(T)  - Z(o))'^]  = E[x(t)  (z(T)  - z(o))''^] 


= E 


= E 


= E 


x(t)  ^ J H(s)  x(s)  ds  + d v(s)j 


0 

.t  .T 


K(t,  r)  dz(r)  | | J H(s)  x(s)  ds  + dv(s) 


J J K(t,  r)  (H(r)  x(r)  dr  + dv(r))  (H(s)  x(s)  ds  + dv(s)) 


0 a 


= E 


.c  ,T 


j*  J K.(t,  r)  H(r)  x(r)  x^(s)  H^(s)  ds  dr 


0 a 
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t j: 


+ j I «t. 


r)  H(r)  x(r)  dv  (s)  dr 


0 a 


.t 


+ J J K(t,  r)  dv(r)  x^(s)  h'^(s)  ds 


0 a 


t 


^ 1 I 


K(t,  r)  dv(r)  dv  (s) 


0 a 


From  the  properties  of  Wiener  processes, 


I / K(t,  r)  dv(r)  dv^(s)  = J K(t,  s)  R^(s)  ds 


0 O 


where  R is  the  covariance  matrix  of  the  process.  Furthermore, 

^ T 

dv(s)  and  x(s)  are  independent,  so  E dv(r)  x (s)  = 0;  hence 

E[x(t)  (z(T)  - z(0))^]  = J I J K(t,  r)  H(r)  E[x(r)  x^(s) 

a [ 0 


(5 


H (s)  dr  + K(t,  s)  R^(s) 


for  all  O and  T.  On  the  other  hand,  from  (5.3) 


ds 


E[x(t)  (z(t)  - z(a))  ] = E 


x(t)  I J 

L ^ o 


H(s)  x(s)  ds  + dv(s) 


J E[x(t)  x^(s)]  H^(s)  ds 


■ I'  I f 


K(t,  r)  H(r)  E[x(r)  x^(s)]  H^(s 


+ K(t,  s)  R^(s)  \ ds 


] 

.7) 


) dr 
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where  the  last  equality  results  from  (5.7).  Since  this  equation  holds 
for  all  a and  t in  the  interval  [0,  t]. 


K(t,  s)  R (s)  = E[x(t)  x^(s)]  H^(s)  - J K(t,  r)  H(r)  E[x(r)  x’^(s)]  H^(s)  dr 

0 

(5.8) 

This  is  a nonhomogeneous  integral  equation  for  K(t,  s).  Its  kernel 
T T 

is  H(r)  E[x(r)  x (s)]  H (s) . Since  it  corresponds  to  a positive  definite 

quadratic  form,  all  its  eigenvalues  are  positive  and  the  equation  has  a 

solution.  Unfortunately,  it  is  not  possible  to  calculate  K(t,  s)  from 

T 

this  equation  because  E[x(r)  x (s)],  the  covariance  of  x(s),  is  unknown. 

A different  equation  for  K(t,  s)  can  be  obtained  from  (5.8)  by 
differentiating  both  sides  of  it  with  respect  to  t. 

^ “ ft  x^(s)]  H^(s) 

- K(t,  t)  H(t)  E(x(t)  x^(s)]  H^(s) 

^ E[x(r)  x^(s)]  H^(s)  dr 

dx  = Ax  dt  + dw 


- 


By  (5.1) 


Hence 


E[dx(t)  x'^^(s)]  h'”^(s)  = A(t)  E[x(t)  x^(s)]  h'^(s)  dt 

+ E[dw(t)  x^(s)] 
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The  second  term  vanishes  since  dw(t)  and  x (s)  are  independent  if 
t ^ s.  This  yields 

Ry(s)  = [A(t)  - K(t,  t)  H(t)]  E[x(t)  x^(s)]  H^(s) 


- r 


(t,  r)  H(r)  E[x(r)  x^(s)]  H^(s)  dr 


Use  of  the  integral  equation  (5.8)  to  obtain  an  expression  for 
E[x(t)  x^(s)]  h"^(s)  yields 


H(r)  E[x(r)  x^(s)]  H^(s)  dr 


(5.9) 


Set 


l(t,  s)  = - - + K(t,  t)  H(t)  K(t,  s)  - A(t)  K(t,  s) 


Then  by  (5.9) 


'J^(t,  s) 


- J% 


tj^(t,  r)  H(r)  E[x(r)  x^(s)]  H^(s)  R^^(s)  dr 

(5.10) 


Since  the  kernel  of  this  integral  equation  corresponds  to  a positive 
definite  quadratic  form,  the  only  solution  of  (5.10)  is 


i|<(t,  s)  = 0 
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This  yields  the  following  differential  equation  for  K(t,  s): 


From  the  integral  equation  (5.8)  for  K(t,  s) 


K(t,  t)  R^(t)  = E[x(t)  x’^(t)]  H'^(t) 


- Tk.. 


r)  H(r)  E[x(r)  x^(t)]  H^(t)  dr 


On  the  other  hand, 


E[x(t)  X (t) ] = E 


K(t,  r)  dz(r)  x (t) 


1'  KU. 


r)  H(r)  E[x(r)  x (t)]  dr 


J' 


+ J K(t,  r)  E[dv(r)  x (t)] 
0 


where  the  second  integral  vanishes;  hence 

K(t,  t)  R^(t)  = E[x(t)  x'^(t)]  H'^(t)  - E[x(t)  x'^(t)]  H^(t) 
= E[(x(t)  - ^^(t))  x^(t)]  H^(t) 

= {E[x(t)  x^(t)]  + E[x(t)  x^(t)]}  H^(t) 
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The  condition  that  x is  the  best  estimate  from  among  all  linear  func- 

^ T 

tionals  of  z(t)  leads  to  the  result  that  E[x(t)  x (t)]  = 0.  Hence 

P(t)  H^(t)  - E[S(t)  ^^(t)]  H^(t)  = K(t,  t)  R^(t)  (5.12) 

From  the  stochastic  integral, 


x(t)  = J K(t,  r)  dz(r) 

0 

dx(t)  = K(t,  t)  dz(t)  + J - dz(r)  dt 

= P h'^  dz(t)  + I (A(t)  K(t,  r)  - K(t,  t)  H(t)  K(t,  r))  dz(r)  dt 

0 

or 

dx(t)  = A(t)  x(t)  dt  + P R^^(dz(t)  - H(t)  x(t)  dt)  (5.13) 

Since  z(t)  and  presumably  dz(t)  are  known,  this  is  a stochastic  differ- 
ential equation  for  x(t). 

Note  that 

dz(t)  - H(t)  x(t)  dt  = dz(t)  - H(t)  x(t)  dt  + H(t)  x(t)  dt 

= dv(t)  + H(t)  x(t)  dt 


From  this  expression  and  (5.13),  we  get  the  following  stochastic 
differential  equation  for  x 
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dx  = dx(t)  - dx(t) 


=Axdt+dw-Axdt-P  (dv  + H(t)  x) 

» A X dt  + dw  - P R”^  dv  - P H^R~^  H x dt 

V V 

- [A  - P r”^  H]  X dt  + dw  - P R”^  dv 

V V 

with  x(0)  * x(0).  By  the  methods  developed  in  the  previous  section 
for  stochastic  differential  equations, 

P(t)  = E[X(t)  i^(t)] 

= <I)(t,  0)  r 0)  + J <{)(t,  a)  [Q(o) 

0 

+ P(a)  H^(a)  R~^(o)  H(a)  P^]  4>^(t,  a)  da  (5.14) 

where  ^>(t,  i)  is  the  transport  matrix  associated  with  the  linear 
differential  equation 

= (A  - P r”^  H)  y (5.15) 

dt  V ’’ 

and  where 

Q dx  = E[dw(x)  dw^(x)]  (5.16) 

from,  (4.11) 

dP  T -1  T -1  T 

= (A  - P H R H)  P + P(A  - P H R H) 
dt  V V 

+ Q(t)  + P R"^  H P^ 

V 
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or 


dP  T T -1 

^=AP  + PA  -PH  R HP-Q 
dt  V 


(5.17) 


P(0)  = r (5.18) 

This  set  of  equations  finishes  the  solution  of  the  filter  problem. 

The  optimal  filter  is  a feedback  system  which  is  described  by  the 

stochastic  differential  equation  (5.13).  It  is  obtained  by  taking  the 

measurements  z(t),  forming  the  error  signal  z(t)  - H(t)  x(t),  and  feed- 

T -1 

ing  the  error  forward  with  a gain  P(t)  H (t)  (t).  P(t),  the  error 

variance,  is  obtained  as  a solution  to  the  nonlinear  Riccati-type 
equation  (5.17),  H(t)  is  a known  transformation  matrix,  and  R is 

V 

the  variance  of  the  Wiener  process  dv.  A block  diagram  of  the  filter  is 
shown  in  Figure  8.  The  variables  appearing  in  this  diagram  are  vectors, 
and  the  boxes  represent  matrices  operating  on  vectors.  The  double  lines 
which  indicate  direction  of  signal  flow  serve  as  a reminder  that  multiple 
signals  rather  than  a single  one  are  being  directed. 


Figure  8 — Optimal  Filter 
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